Computationally Inspired Biotechnologies: Improved DNA Synthesis and Associative Search Using Error-Correcting Codes and Vector-Quantization
نویسندگان
چکیده
The main theme of this paper is to take inspiration from methods used in computer science and related disciplines, and to apply these to develop improved biotechnology. In particular, our proposed improvements are made by adapting various information theoretic coding techniques which originate in computational and information processing disciplines, but which we re-tailor to work in the biotechnology context. (a) We apply Error-Correcting Codes, developed to correct transmission errors in electronic media, to decrease (in certain contexts, optimally) error rates in optically-addressed DNA synthesis (e.g., of DNA chips). (b) We apply Vector-Quantization (VQ) Coding techniques (which were previously used to cluster, quantize, and compress data such as speech and images) to improve I/O rates (in certain contexts, optimally) for transformation of electronic data to and from DNA with bounded error. (c) We also apply VQ Coding techniques, some of which hierarchically cluster the data, to improve associative search in DNA databases by reducing the problem to that of exact aanity separation. These improvements in biotechnology appear to have some general applicability beyond biomolecular computing. As a motivating example, this paper improves biotechnology methods to do associative search in DNA databases. Baum B95] previously proposed the use of biotechnology aanity methods (DNA annealing) to do massively parallel associative search in large databases encoded as DNA strands, but many remaining issues were not developed. Using in part our improved biotechnology techniques based on Error-Correction and VQ Coding, we develop detailed procedures for the following tasks: (i) The database may initially be in conventional (electronic, magnetic, or optical) media, rather than the form of DNA strands. For input and output (I/O) to and from conventional media, we apply DNA chip technology improved by Error-Correction and VQ Coding methods for error-correction and compression. ? A postscript version of this paper is at URL (ii) The query may not be an exact match or even partial match with any data in the database, but since DNA annealing aanity methods work best for these cases, we apply various VQ Coding methods for reening the associative search to exact matches. (iii) We also brieey discuss how to extend associative search queries in DNA databases to more sophisticated hybrid queries that include also Boolean formula conditionals with a bounded number of Boolean variables , by combining our methods for DNA associative search with known BMC methods for solving small size SAT problems. For example, these extended queries could …
منابع مشابه
One-point Goppa Codes on Some Genus 3 Curves with Applications in Quantum Error-Correcting Codes
We investigate one-point algebraic geometric codes CL(D, G) associated to maximal curves recently characterized by Tafazolian and Torres given by the affine equation yl = f(x), where f(x) is a separable polynomial of degree r relatively prime to l. We mainly focus on the curve y4 = x3 +x and Picard curves given by the equations y3 = x4-x and y3 = x4 -1. As a result, we obtain exact value of min...
متن کاملError-correcting output codes based ensemble feature extraction
This paper proposes a novel feature extraction method based on ensemble learning. Using the errorcorrecting output codes (ECOC) to design binary classifiers (dichotomizers) for separating subsets of classes, the outputs of the dichotomizers are linear or nonlinear features that provide powerful separability in a new space. In this space, the vector quantization based meta classifier can be view...
متن کاملAn approach to fault detection and correction in design of systems using of Turbo codes
We present an approach to design of fault tolerant computing systems. In this paper, a technique is employed that enable the combination of several codes, in order to obtain flexibility in the design of error correcting codes. Code combining techniques are very effective, which one of these codes are turbo codes. The Algorithm-based fault tolerance techniques that to detect errors rely on the c...
متن کاملOn the synthesis of DNA error correcting codes
DNA error correcting codes over the edit metric consist of embeddable markers for sequencing projects that are tolerant of sequencing errors. When a genetic library has multiple sources for its sequences, use of embedded markers permit tracking of sequence origin. This study compares different methods for synthesizing DNA error correcting codes. A new code-finding technique called the salmon al...
متن کاملAngular Quantization-based Binary Codes for Fast Similarity Search
This paper focuses on the problem of learning binary codes for efficient retrieval of high-dimensional non-negative data that arises in vision and text applications where counts or frequencies are used as features. The similarity of such feature vectors is commonly measured using the cosine of the angle between them. In this work, we introduce a novel angular quantization-based binary coding (A...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2000